Class 3 – 11/5/2002
Special Variables
These
are for your future reference! They are often very useful, but use
with care.
$_
The
default input and pattern-searching space. The following pairs are
equivalent:
while (<>) { ... }
while ($_ = <>) { ... }
/^Subject:/
$_ =~ /^Subject:/
y/a-z/A-Z/
$_ =~ y/a-z/A-Z/
chop
chop($_)
(Mnemonic:
underline is understood in certain operations.)
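Here is a small sketch of $_ doing its job behind the scenes. The helper name count_subject_lines and the sample mail lines are made up for illustration; the foreach loop aliases each element to $_, and the bare pattern match tests $_.

```perl
use strict;
use warnings;

## count_subject_lines is a hypothetical helper for illustration.
## foreach with no loop variable aliases each element to $_,
## and a bare pattern match is applied to $_.
sub count_subject_lines {
    my @lines = @_;
    my $count = 0;
    foreach (@lines) {
        $count++ if /^Subject:/;    ## same as: $_ =~ /^Subject:/
    }
    return $count;
}

my @mail = ("From: someone\n", "Subject: hello\n", "Body text\n");
print "Subject lines: ", count_subject_lines(@mail), "\n";
```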
$.
The
current input line number of the last filehandle that was read.
Readonly. Remember that only an explicit close on the filehandle
resets the line number. (Mnemonic: many programs use . to mean the
current line number.)
$/
The
input record separator, newline by default. Works like awk's RS
variable, including treating blank lines as delimiters if set to the
null string. You may set it to a multicharacter string to match a
multi-character delimiter. Note that setting it to "\n\n"
means something slightly different than setting it to "",
if the file contains consecutive blank lines. Setting it to ""
will treat two or more consecutive blank lines as a single blank
line. Setting it to "\n\n" will blindly assume that the
next input character belongs to the next paragraph, even if it's a
newline. (Mnemonic: / is used to delimit line boundaries when quoting
poetry.)
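To see the difference between "" and "\n\n" for yourself, here is a minimal sketch. The helper read_paragraphs is hypothetical, and it assumes a Perl new enough (5.8 or later) to open a filehandle on a string.

```perl
use strict;
use warnings;

## read_paragraphs is a hypothetical helper. It reads $text through an
## in-memory filehandle using the record separator given in $separator.
sub read_paragraphs {
    my ($text, $separator) = @_;
    local $/ = $separator;      ## restored automatically when the sub returns
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    my @records = <$fh>;
    close($fh);
    return @records;
}

## "one" and "two" are separated by three blank lines
my $text = "one\n\n\n\ntwo\n";
my @para_mode = read_paragraphs($text, "");
my @two_nl    = read_paragraphs($text, "\n\n");
print scalar(@para_mode), " records with \"\"\n";
print scalar(@two_nl), " records with \"\\n\\n\"\n";
```

With "" the run of blank lines counts as one separator, so fewer records come back than with "\n\n".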
$,
The
output field separator for the print operator. Ordinarily the print
operator simply prints out the comma separated fields you specify. In
order to get behavior more like awk, set this variable as you would
set awk's OFS variable to specify what is printed between fields.
(Mnemonic: what is printed when there is a , in your print
statement.)
$"
This
is like $, except that it applies to array values interpolated into a
double-quoted string (or similar interpreted string). Default is a
space. (Mnemonic: obvious, I think.)
$\
The
output record separator for the print operator. Ordinarily the print
operator simply prints out the comma separated fields you specify,
with no trailing newline or record separator assumed. In order to get
behavior more like awk, set this variable as you would set awk's ORS
variable to specify what is printed at the end of the print.
(Mnemonic: you set $\ instead of adding \n at the end of the print.
Also, it's just like /, but it's what you get "back" from
perl.)
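The three output variables $, and $" and $\ are easier to remember once you see them together. This is only a sketch: format_fields is a hypothetical helper, and it prints to an in-memory filehandle (Perl 5.8 or later) so the result can be captured in a string.

```perl
use strict;
use warnings;

## format_fields is a hypothetical helper. It captures what print produces
## when $, and $\ are set, by printing to an in-memory filehandle.
sub format_fields {
    my @fields = @_;
    my $output = '';
    open(my $out, '>', \$output) or die "Can't open in-memory file: $!";
    {
        local $, = ':';     ## printed between the items in the list
        local $\ = "\n";    ## printed after the last item
        print {$out} @fields;
    }
    close($out);
    return $output;
}

print format_fields('root', 'x', 0);

my @list = ('a', 'b', 'c');
{
    local $" = '-';         ## joins array elements interpolated in a string
    print "@list\n";
}
```

Using local inside a block keeps the change from leaking into the rest of the script, which is part of using these variables "with care".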
$#
The
output format for printed numbers. This variable is a half-hearted
attempt to emulate awk's OFMT variable. There are times, however,
when awk and perl have differing notions of what is in fact numeric.
Also, the initial value is %.20g rather than %.6g, so you need to set
$# explicitly to get awk's value. (Mnemonic: # is the number sign.)
$%
The
current page number of the currently selected output channel.
(Mnemonic: % is page number in nroff.)
$=
The
current page length (printable lines) of the currently selected
output channel. Default is 60. (Mnemonic: = has horizontal lines.)
$-
The
number of lines left on the page of the currently selected output
channel. (Mnemonic: lines_on_page - lines_printed.)
$~
The
name of the current report format for the currently selected output
channel. Default is name of the filehandle. (Mnemonic: brother to
$^.)
$^
The
name of the current top-of-page format for the currently selected
output channel. Default is name of the filehandle with "_TOP"
appended. (Mnemonic: points to top of page.)
$|
If
set to nonzero, forces a flush after every write or print on the
currently selected output channel. Default is 0. Note that STDOUT
will typically be line buffered if output is to the terminal and
block buffered otherwise. Setting this variable is useful primarily
when you are outputting to a pipe, such as when you are running a
perl script under rsh and want to see the output as it's happening.
(Mnemonic: when you want your pipes to be piping hot.)
$$
The
process number of the perl running this script. (Mnemonic: same as
shells.)
$?
The
status returned by the last pipe close, backtick (``) command or
system operator. Note that this is the status word returned by the
wait() system call, so the exit value of the subprocess is actually
($? >> 8). $? & 255 gives which signal, if any, the process
died from, and whether there was a core dump. (Mnemonic: similar to
sh and ksh.)
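As a rough sketch, the status word can be taken apart like this. The helper decode_status is hypothetical; the split of the low byte into a signal number (low 7 bits) and a core dump flag (bit 7) follows the usual wait() layout.

```perl
use strict;
use warnings;

## decode_status is a hypothetical helper that splits a wait() status
## word, such as the value left in $? by system(), into its parts.
sub decode_status {
    my ($status) = @_;
    my $exit_value  = $status >> 8;     ## high byte: exit value
    my $signal      = $status & 127;    ## low 7 bits: signal number
    my $dumped_core = $status & 128;    ## bit 7: core dump flag
    return ($exit_value, $signal, $dumped_core ? 1 : 0);
}

## $^X is the perl running this script, so no other program is needed
system($^X, '-e', 'exit 3');
my ($exit_value, $signal, $core) = decode_status($?);
print "exit=$exit_value signal=$signal core=$core\n";
```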
$&
The
string matched by the last successful pattern match (not counting any
matches hidden within a BLOCK or eval enclosed by the current BLOCK).
(Mnemonic: like & in some editors.)
$`
The
string preceding whatever was matched by the last successful pattern
match (not counting any matches hidden within a BLOCK or eval
enclosed by the current BLOCK). (Mnemonic: ` often precedes a quoted
string.)
$'
The
string following whatever was matched by the last successful pattern
match (not counting any matches hidden within a BLOCK or eval
enclosed by the current BLOCK). (Mnemonic: ' often follows a quoted
string.) Example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
$+
The
last bracket matched by the last search pattern. This is useful if
you don't know which of a set of alternative patterns matched. For
example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
(Mnemonic: be positive and forward looking.)
$*
Set
to 1 to do multiline matching within a string, 0 to tell perl that it
can assume that strings contain a single line, for the purpose of
optimizing pattern matches. Pattern matches on strings containing
multiple newlines can produce confusing results when $* is 0. Default
is 0. (Mnemonic: * matches multiple things.) Note that this variable
only influences the interpretation of ^ and $. A literal newline can
be searched for even when $* == 0.
$0
Contains
the name of the file containing the perl script being executed.
Assigning to $0 modifies the argument area that the ps(1) program
sees. (Mnemonic: same as sh and ksh.)
$<digit>
Contains
the subpattern from the corresponding set of parentheses in the last
pattern matched, not counting patterns matched in nested blocks that
have been exited already. (Mnemonic: like \digit.)
$[
The
index of the first element in an array, and of the first character in
a substring. Default is 0, but you could set it to 1 to make perl
behave more like awk (or Fortran) when subscripting and when
evaluating the index() and substr() functions. (Mnemonic: [ begins
subscripts.)
$]
The
string printed out when you say "perl -v". It can be used
to determine at the beginning of a script whether the perl
interpreter executing the script is in the right range of versions.
If used in a numeric context, returns the version + patchlevel /
1000. Example:
## see if getc is available
($version,$patchlevel) = $] =~ /(\d+\.\d+).*\nPatch level: (\d+)/;
print STDERR "(No filename completion available.)\n" if $version * 1000 + $patchlevel < 2016;
## or, used numerically,
warn "No checksumming!\n" if $] < 3.019;
(Mnemonic:
Is this version of perl in the right bracket?)
$;
The
subscript separator for multi-dimensional array emulation. If you
refer to an associative array element as
$foo{$a,$b,$c}
it really means
$foo{join($;, $a, $b, $c)}
But don't put
@foo{$a,$b,$c} # a slice--note the @
which means
($foo{$a},$foo{$b},$foo{$c})
Default
is "\034", the same as SUBSEP in awk. Note that if your
keys contain binary data there might not be any safe value for $;.
(Mnemonic: comma (the syntactic subscript separator) is a
semi-semicolon. Yeah, I know, it's pretty lame, but $, is already
taken for something more important.)
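A short sketch of the multi-dimensional emulation in action. The hash %grid and the helper cell_key are made-up names for illustration.

```perl
use strict;
use warnings;

my %grid;
$grid{1, 2} = 'x';                  ## stored under the key join($;, 1, 2)

## cell_key is a hypothetical helper that builds the same key by hand
sub cell_key {
    my @subscripts = @_;
    return join($;, @subscripts);
}

print "same key\n" if exists $grid{ cell_key(1, 2) };

## Splitting a key on the separator recovers the original subscripts
my $separator = $;;                 ## the variable $; then the statement's ;
foreach my $key (keys %grid) {
    my ($row, $col) = split(/\Q$separator\E/, $key);
    print "row=$row col=$col value=$grid{$key}\n";
}
```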
$!
If
used in a numeric context, yields the current value of errno, with
all the usual caveats. (This means that you shouldn't depend on the
value of $! to be anything in particular unless you've gotten a
specific error return indicating a system error.) If used in a string
context, yields the corresponding system error string. You can assign
to $! in order to set errno if, for instance, you want $! to return
the string for error n, or you want to set the exit value for the die
operator. (Mnemonic: What just went bang?)
$@
The
perl syntax error message from the last eval command. If null, the
last eval parsed and executed correctly (although the operations you
invoked may have failed in the normal fashion). (Mnemonic: Where was
the syntax error "at"?)
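In practice $@ is also how you find out about run-time errors trapped by eval, not just syntax errors. Here is a minimal sketch using the block form of eval; safe_divide is a hypothetical helper.

```perl
use strict;
use warnings;

## safe_divide is a hypothetical helper. It traps a run-time error with
## eval and reports it through $@ instead of letting the script die.
sub safe_divide {
    my ($numerator, $denominator) = @_;
    my $result = eval { $numerator / $denominator };
    if ($@) {
        return (undef, $@);     ## $@ holds the error message
    }
    return ($result, '');       ## $@ is empty: the eval succeeded
}

my ($value, $error) = safe_divide(1, 0);
print "Error: $error" if $error;
```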
$<
The
real uid of this process. (Mnemonic: it's the uid you came FROM, if
you're running setuid.)
$>
The
effective uid of this process. Example:
$< = $>; # set real uid to the effective uid
($<,$>) = ($>,$<); # swap real and effective uid
(Mnemonic: it's the uid you went TO, if you're running setuid.) Note:
$< and $> can only be swapped on machines supporting
setreuid().
$(
The
real gid of this process. If you are on a machine that supports
membership in multiple groups simultaneously, gives a space separated
list of groups you are in. The first number is the one returned by
getgid(), and the subsequent ones by getgroups(), one of which may be
the same as the first number. (Mnemonic: parentheses are used to
GROUP things. The real gid is the group you LEFT, if you're running
setgid.)
$)
The
effective gid of this process. If you are on a machine that supports
membership in multiple groups simultaneously, gives a space separated
list of groups you are in. The first number is the one returned by
getegid(), and the subsequent ones by getgroups(), one of which may
be the same as the first number. (Mnemonic: parentheses are used to
GROUP things. The effective gid is the group that's RIGHT for you, if
you're running setgid.)
Note:
$<, $>, $( and $) can only be set on machines that support the
corresponding set[re][ug]id() routine. $( and $) can only be swapped
on machines supporting setregid().
$:
The
current set of characters after which a string may be broken to fill
continuation fields (starting with ^) in a format. Default is " \n-",
to break on whitespace or hyphens. (Mnemonic: a "colon"
in poetry is a part of a line.)
$^D
The
current value of the debugging flags. (Mnemonic: value of -D switch.)
$^F
The
maximum system file descriptor, ordinarily 2. System file descriptors
are passed to subprocesses, while higher file descriptors are not.
During an open, system file descriptors are preserved even if the
open fails. Ordinary file descriptors are closed before the open is
attempted.
$^I
The
current value of the inplace-edit extension. Use undef to disable
inplace editing. (Mnemonic: value of -i switch.)
$^L
What
formats output to perform a formfeed. Default is \f.
$^P
The
internal flag that the debugger clears so that it doesn't debug
itself. You could conceivably disable debugging yourself by clearing
it.
$^T
The
time at which the script began running, in seconds since the epoch.
The values returned by the -M , -A and -C filetests are based on this
value.
$^W
The
current value of the warning switch. (Mnemonic: related to the -w
switch.)
$^X
The
name that Perl itself was executed as, from argv[0].
$ARGV
contains
the name of the current file when reading from <>.
@ARGV
The
array ARGV contains the command line arguments intended for the
script. Note that $#ARGV is generally the number of arguments minus
one, since $ARGV[0] is the first argument, NOT the command name. See
$0 for the command name.
@INC
The
array INC contains the list of places to look for perl scripts to be
evaluated by the "do EXPR" command or the "require"
command. It initially consists of the arguments to any -I command
line switches, followed by the default perl library, probably
"/usr/local/lib/perl", followed by ".", to
represent the current directory.
%INC
The
associative array INC contains entries for each filename that has
been included via "do" or "require". The key is
the filename you specified, and the value is the location of the file
actually found. The "require" command uses this array to
determine whether a given file has already been included.
$ENV{expr}
The
associative array ENV contains your current environment. Setting a
value in ENV changes the environment for child processes.
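The following sketch shows a child process picking up a value set in %ENV. The helper child_sees and the variable name CLASS3_DEMO are hypothetical, and the list form of a piped open is assumed to be available (Perl 5.8 or later on Unix-like systems).

```perl
use strict;
use warnings;

## child_sees is a hypothetical helper. It sets an environment variable and
## asks a child perl process (found through $^X) to print it back.
sub child_sees {
    my ($name, $value) = @_;
    local $ENV{$name} = $value;     ## restored when the sub returns
    open(my $child, '-|', $^X, '-e', 'print $ENV{$ARGV[0]}', $name)
        or die "Can't start child perl: $!";
    my $seen = <$child>;
    close($child);
    return $seen;
}

print "Child saw: ", child_sees('CLASS3_DEMO', 'hello'), "\n";
```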
$SIG{expr}
The
associative array SIG is used to set signal handlers for various
signals. Example:
sub handler { # 1st argument is signal name
local($sig) = @_;
print "Caught a SIG$sig--shutting down\n";
close(LOG);
exit(0);
}
$SIG{'INT'} = 'handler'; ## Call the handler function upon SIGINT
$SIG{'QUIT'} = 'handler'; ## Call the handler function upon SIGQUIT
...
$SIG{'INT'} = 'DEFAULT'; # restore default action
$SIG{'QUIT'} = 'IGNORE'; # ignore SIGQUIT
The
SIG array only contains values for the signals actually set within
the perl script.
Misc file test operators (these don't belong here, but I needed to get them in)
-x
A file test. This unary
operator takes one argument, either a filename or a filehandle, and
tests the associated file to see if something is true about it. If
the argument is omitted, tests $_, except for -t, which tests STDIN.
It returns 1 for true and '' for false, or the undefined value if the
file doesn't exist. Precedence is higher than logical and relational
operators, but lower than arithmetic operators. The operator may be
any of:
-r File is readable by effective uid/gid.
-w File is writable by effective uid/gid.
-x File is executable by effective uid/gid.
-o File is owned by effective uid.
-R File is readable by real uid/gid.
-W File is writable by real uid/gid.
-X File is executable by real uid/gid.
-O File is owned by real uid.
-e File exists.
-z File has zero size.
-s File has non-zero size (returns size).
-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
-p File is a named pipe (FIFO).
-S File is a socket.
-b File is a block special file.
-c File is a character special file.
-u File has setuid bit set.
-g File has setgid bit set.
-k File has sticky bit set.
-t Filehandle is opened to a tty.
-T File is a text file.
-B File is a binary file (opposite of -T).
-M Age of file in days when script started.
-A Same for access time.
-C Same for inode change time.
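A few of these tests combined into one routine, as a sketch. The helper describe_file is hypothetical. Note the parentheses around (-s $path), which keep the file test from swallowing the rest of the expression.

```perl
use strict;
use warnings;

## describe_file is a hypothetical helper that applies a few file tests
sub describe_file {
    my ($path) = @_;
    return "missing"   unless -e $path;
    return "directory" if -d $path;
    return "empty"     if -z $path;
    return "file of " . (-s $path) . " bytes" if -f $path;
    return "other";
}

print "$0 is: ", describe_file($0), "\n";
print ". is: ", describe_file('.'), "\n";
```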
File and Network I/O
Introduction
Since Perl was originally
written as a report generator, it's not surprising that it can
perform various I/O operations. In fact, since I/O is such an
important part of Perl, it has more magic than most other parts. The
special filehandles STDIN, STDOUT and STDERR come pre-opened, so you
don't need to open them to use them.
Opening a File
You
can open a file for input or output using the open() function.
## Open a file for reading
open(INFILE, "input.txt") or quit("Can't open input.txt: $!", 1);
## Open a file for writing – in overwrite mode
open(OUTFILE, "> output.txt") or quit("Can't open output.txt: $!", 1);
## Open a file for writing – in append mode
open(LOGFILE, ">> my.log") or quit("Can't open logfile: $!", 1);
In the first example above,
INFILE is the filehandle. By convention, filehandles are in
all-caps, to set them apart. The second argument specifies the
filename. By default, the file is opened for reading.
You can substitute any of
the “real” filenames with a variable:
%conf = ( "logFile" => "/var/log/my.log" );
open(LOGFILE, ">> $conf{'logFile'}") or quit("Can't open logfile: $!", 1);
binmode(FILEHANDLE)
Arranges for the file to
be read in "binary" mode in operating systems that
distinguish between binary and text files. Files that are not read in
binary mode have CR LF sequences translated to LF on input and LF
translated to CR LF on output. Binmode has no effect under Unix.
Reading from a Filehandle
You can read from an open
filehandle using the <> (diamond) operator. In scalar context
it reads a single line from the filehandle, and in list context it
reads the whole file in, assigning each line to an element of the
list:
my $line = <INFILE>;
## $line is now the first (or next) line from INFILE
my @lines = <INFILE>;
## @lines is now a list of every line from INFILE
$lines[616] = "hi there";
Reading in the whole file
at one time is called slurping. It can be useful but it can be a
memory hog. Most text file processing can be done a line at a time
with Perl's looping constructs.
The <> operator is
most often seen in a while loop:
while (<INFILE>) {
## assigns each line (one at a time) to $_
print "Just read in this line: $_";
}
...or when you need to
read one line from STDIN:
print "Do you want to continue? [y/n] ";
my $answer = <STDIN>;
if ($answer =~ /y/i) {
## Do something here
}
## Side note from class: getc() can be used to read a single
## character from a filehandle.
my $character = getc(INFILE);
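## And the chomp function removes the trailing line ending (the value of $/,
## a newline by default) from a string. It modifies the string in place and
## returns the number of characters removed, so don't assign its result.
$line = "blah\n";
chomp($line); ## $line is now "blah"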
End of Files (EOF)
The eof function tests
end-of-file status. Normally, it is invoked as eof(FILEHANDLE), which
returns true if FILEHANDLE is currently at the end of file (i.e., if
the next read would return the undefined value).
If you omit the
FILEHANDLE argument, eof tests the last filehandle that was read
from.
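Here is one way eof can be used inside a read loop, as a sketch. The helper mark_last_line is hypothetical, and the sample input comes from an in-memory filehandle (Perl 5.8 or later) so the example needs no real file.

```perl
use strict;
use warnings;

## mark_last_line is a hypothetical helper. It reads a filehandle line by
## line and uses eof() to notice when the line just read was the last one.
sub mark_last_line {
    my ($fh) = @_;
    my @marked;
    while (my $line = <$fh>) {
        chomp($line);
        $line .= " (last line)" if eof($fh);
        push(@marked, $line);
    }
    return @marked;
}

my $text = "first\nsecond\nthird\n";
open(my $in, '<', \$text) or die "Can't open in-memory file: $!";
print "$_\n" foreach mark_last_line($in);
close($in);
```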
Writing to a Filehandle
We've already seen how to
print to standard output using print(). However, print() can also
take an optional first argument specifying which filehandle to print
to:
print STDERR "This is your final warning.\n"; ## Prints to STDERR
print OUTFILE "$record\n"; ## Prints to the OUTFILE filehandle
print LOGFILE $logmessage; ## Prints to the LOGFILE filehandle
Closing a Filehandle
When you're done with
your filehandles, you should close() them (Perl will clean up after
you if you forget, but it's good practice to close your own
filehandles):
close(LOGFILE);
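Putting open, print, the diamond operator and close together, a complete round trip might look like the sketch below. The helper write_then_read and the scratch filename are made up for illustration, and die stands in for whatever error routine your own script uses.

```perl
use strict;
use warnings;

## write_then_read is a hypothetical helper showing the whole cycle:
## open for writing, print, close, then open for reading and read back.
sub write_then_read {
    my ($filename, @records) = @_;

    open(OUTFILE, "> $filename") or die "Can't open $filename: $!";
    print OUTFILE "$_\n" foreach @records;
    close(OUTFILE);

    open(INFILE, $filename) or die "Can't open $filename: $!";
    my @lines = <INFILE>;
    close(INFILE);

    chomp(@lines);
    return @lines;
}

my $file = "class3_example_$$.txt";     ## assumed scratch filename
my @back = write_then_read($file, 'alpha', 'beta');
print "Read back: @back\n";
unlink($file);
```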
Outgoing Network Sockets
One method of connecting
to a remote server is:
## Connect to the remote server 192.168.1.16 on TCP port 80
use IO::Socket;
my $server = "192.168.1.16";
my $port = 80;
my $socket = IO::Socket::INET->new( PeerAddr => $server,
PeerPort => $port,
Proto => 'tcp',
Autoflush => 1,
Blocking => 1,
) or quit("$$ - $conf{'programName'} - ERROR - Connect to $server:$port failed. Error was: $!",1);
At this point $socket is a
network socket, which you can use exactly like a filehandle. Proto can be
either “tcp” or “udp”. Autoflush specifies
that Perl should not buffer anything. By default Perl will buffer
connection streams (for network and file I/O), which can make it seem
like things aren't working when they really are. The Blocking
option specifies whether or not to enable blocking reads from the
filehandle. The default is to enable blocking reads. There are more
options; see the IO::Socket::INET documentation.
Writing to a network socket
This is exactly like
writing to any other filehandle:
## Say hi to the remote server
print $socket "Hi there.. want to talk?\r\n";
Reading from a network socket
This is exactly like
reading from any other filehandle:
## See what the server said
my $response = <$socket>;
You can
disconnect from a remote server by simply closing the filehandle.
Incoming Network Sockets (writing server software)
One method of accepting
an incoming network connection is like this:
## Open a network port and listen for incoming connections
use IO::Socket;
my $socket = IO::Socket::INET->new( LocalPort => $port,
Proto => 'tcp',
Listen => 10,
Reuse => 1,
Autoflush => 1,
Blocking => 1,
) or quit("$$ - $conf{'programName'} - OS-ERROR - Failed to bind to tcp port $port. Error was: $!",1);
## Forever (while 1)
## 1. Accept an incoming connection
## 2. Read a message from the client
## 3. Send the client a nice message
## 4. Disconnect the client
while (1) {
my $clientSocket = $socket->accept() or quit("$$ - $conf{'programName'} - OS-ERROR - \$socket->accept failed: $!", 1);
my $message = <$clientSocket>;
print $clientSocket "You said: $message\r\n";
close $clientSocket;
}
I/O Special Variables
As you might expect, there are
a number of special variables associated with I/O. They are:
$_
while (<FILEHANDLE>)
reads the next line into $_ by default.
$/
The input record separator.
This is a newline ("\n") by default.
Note that this is magical: if
you set it to the empty string (""), it will behave as if
you had set it to two newlines ("\n\n"), with this
exception: two or more blank lines in a row will be compressed into
one blank line. This makes it easy to read files one paragraph at a
time.
$|
If set to a nonzero value,
forces a flush every time you write to the currently-selected
filehandle.
$,
Output field separator. When
you print several items, separated by commas, Perl inserts the value
of $, between each item.
$"
Like $,, but applies to arrays
interpolated into a double-quoted string.
$.
The current input line number
for the last filehandle that was read from.
$=
The number of lines per page on
the currently-selected output channel.
$-
The number of lines left on the
current output page.
$%
The current page number of the
currently-selected output channel.
$~
The name of the current format
for the currently-selected output channel.
$^
The name of the current
top-of-page format for the currently-selected output channel.
$:
A string containing the
characters after which it is okay to break a long line in a format,
and start filling in continuation (^) fields. This is " \n-"
by default.
$^L
The string that formats should
output to produce a form feed. This is "\f" by default.
$^A
The current value of the write
accumulator for format lines. See perlform(1) and perlfunc(1) for
details.
$^I
The current value of the
inplace-edit extension. If Perl is running with the -i command-line
option, but no backup extension specified, $^I will be the empty
string. If the -i option was not specified, $^I has the undefined
value.
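As a quick sketch of one of these in use, here is $. numbering the lines of an input stream. The helper number_lines is hypothetical, and the input is an in-memory filehandle (Perl 5.8 or later) rather than a real file.

```perl
use strict;
use warnings;

## number_lines is a hypothetical helper that prefixes each line with
## the value of $. at the moment the line was read.
sub number_lines {
    my ($fh) = @_;
    my @numbered;
    while (my $line = <$fh>) {
        push(@numbered, sprintf("%d: %s", $., $line));
    }
    return @numbered;
}

my $text = "alpha\nbeta\ngamma\n";
open(my $in, '<', \$text) or die "Can't open in-memory file: $!";
print number_lines($in);
close($in);
```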
Subroutines (Functions)
Introduction
http://www.perldoc.com/perl5.8.0/pod/perlsub.html
A
subroutine may be declared as follows:
sub NAME { ... }
and
called as:
NAME(arg1, arg2, ...); ## Arguments are optional!
## or
&NAME(arg1, arg2, ...);
Any
arguments passed to the routine come in as array @_. So to get
“arg1” from the above example you would do something like
this:
sub NAME {
my $arg1 = $_[0];
## Or you could do this instead (better)
my ($arg1, $arg2) = @_;
}
The
return value of the subroutine is the value of the last expression
evaluated, and can be either an array value or a scalar value.
Alternately (preferably), a return statement may be used to specify
the returned value and exit the subroutine.
sub NAME {
my ($arg1, $arg2) = @_;
my $answer = $arg1 * $arg2;
return($answer);
}
You
can define functions wherever you like: the Perl compiler will find
them during the compilation phase, and make them available to your
code by the time the body of the program is executed. You don't have
to worry about defining functions before calling them.
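A tiny sketch of that last point: the call below appears before the definition and still works. The function name area is made up for illustration.

```perl
use strict;
use warnings;

## The call comes first; the definition of area (a hypothetical example
## function) appears further down the file.
print "Area: ", area(3, 4), "\n";

sub area {
    my ($width, $height) = @_;
    return $width * $height;
}
```

Calling with parentheses, as in area(3, 4), is what lets Perl recognize the name as a function call before it has seen the definition.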
Example functions
## Function add, adds two numbers and returns the new number
sub add {
my ($number1, $number2) = @_;
my $result = $number1 + $number2;
return($result);
}
Here is a real subroutine
taken from one of my programs:
###############################################################################################
## FUNCTION:
## openLogFile ( $filename )
##
##
## DESCRIPTION:
## Opens the file $filename and attaches it to the filehandle "LOGFILE". Returns 0
## on success and non-zero on failure. Any generated error message will get set in
## global variable $!.
##
##
## Example:
## openLogFile ("/var/log/scanAlert.log");
##
###############################################################################################
sub openLogFile {
## Get the incoming filename
my $filename = $_[0];
## Make sure our file exists, and if the file doesn't exist then create it
if ( ! -f $filename ) {
printmsg("NOTICE: The file [$filename] does not exist. Creating it now with mode [0600].", 0);
open (LOGFILE, ">>$filename");
close LOGFILE;
chmod (0600, $filename);
}
## Now open the file and attach it to a filehandle
open (LOGFILE,">>$filename") or return (1);
## Put the file into non-buffering mode
select LOGFILE;
$| = 1;
select STDOUT;
## Tell the rest of the program that we can log now
$conf{'logging'} = "yes";
## Return success
return(0);
}
Notice the comments at
the top of the function - this is extremely important!
Built-in Functions
Homework
Create a script that can
read from and write to a file. The actual opening and closing of
files should be done in one or more separate subroutines.